A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Abstract
This paper introduces the Multi-Genre Natural Language Inference (MultiNLI) corpus, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding. In addition to being one of the largest corpora available for the task of NLI, at 433k examples, this corpus improves upon available resources in its coverage: it offers data from ten distinct genres of written and spoken English, making it possible to evaluate systems on nearly the full complexity of the language, and it offers an explicit setting for the evaluation of cross-genre domain adaptation.
Similar resources
Data-driven design of a sentence list for an articulatory speech corpus
Articulatory data offers promising developments in our understanding of speech production and advances in speech technologies. However, it is more expensive and difficult to obtain than audio data, which means data collection must be carefully planned. This paper presents a method for designing an articulatory speech corpus comparable to the widely-used TIMIT corpus, for languages other than En...
Understanding Mental States in Natural Language
Understanding mental states in narratives is an important aspect of human language comprehension. By “mental states” we refer to beliefs, states of knowledge, points of view, and suppositions, all of which may change over time. In this paper, we propose an approach for automatically extracting and understanding multiple mental states in stories. Our model consists of two parts: (1) a parser tha...
Beauty and the Beast: What running a broad-coverage precision grammar over the BNC taught us about the grammar — and the corpus
Introduction. Typically, broad-coverage precision grammars are based on grammaticality judgment data and syntactic intuition, and corpus data is relegated to secondary status in guiding lexicon and grammar development. On the other end of the scale, shallow grammars are often induced directly from treebank data and make little or no use of grammaticality judgments or intuition. This tends to cau...
A Model of Language Processing as Hierarchic Sequential Prediction
Computational models of memory are often expressed as hierarchic sequence models, but the hierarchies in these models are typically fairly shallow, reflecting the tendency for memories of superordinate sequence states to become increasingly conflated. This article describes a broad-coverage probabilistic sentence processing model that uses a variant of a left-corner parsing strategy to flatten ...
An Empirical Verification of Coverage and Correctness for a General-Purpose Sentence Generator
This paper describes a general-purpose sentence generation system that can achieve both broad scale coverage and high quality while aiming to be suitable for a variety of generation tasks. We measure the coverage and correctness empirically using a section of the Penn Treebank corpus as a test set. We also describe novel features that help make the generator flexible and easier to use for a var...
Journal title:
- CoRR
Volume: abs/1704.05426
Publication date: 2017